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We develop a general method to quantify the uncertainties of parton distribution func- 
tions and their physical predictions, with emphasis on incorporating all relevant experimental 
constraints. The method uses the Hessian formalism to study an effective chi-squared func- 
tion that quantifies the fit between theory and experiment. Key ingredients are a recently 
developed iterative procedure to calculate the Hessian matrix in the difficult global analysis 
environment, and the use of parameters defined as components along appropriately normal- 
ized eigenvectors. The result is a set of 2d Eigenvector Basis parton distributions (where 
c? 16 is the number of parton parameters) from which the uncertainty on any physical 
quantity due to the uncertainty in parton distributions can be calculated. We illustrate the 
method by applying it to calculate uncertainties of gluon and quark distribution functions, 
W boson rapidity distributions, and the correlation between W and Z production cross 
sections. 
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1 Introduction 



The partonic structure of hadrons plays a fundamental role in elementary particle physics. In- 
terpreting experimental data according to the Standard Model (SM), precision measurement 
of SM parameters, and searches for signals of physics beyond the SM, all rely on the parton 
picture of hadronic beam particles that follows from the factorization theorem of Quantum 
Chromodynamics (QCD). The parton distribution functions (PDFs) are nonperturbative — 
and hence at present uncalculable — functions of momentum fraction x at a low momentum 
transfer scale Qo- They are determined phenomenologically by a global analysis of exper- 
imental data from a wide range of hard-scattering processes, using perturbative QCD to 
calculate the hard scattering and to determine the dependence of the PDFs on Q by the 
renormalization-group based evolution equations. 

Considerable progress has been made in several parallel efforts to improve our knowledge 
of the PDFs 1^, 1^ , but many problems remain open. In the conventional approach, specific 
PDF sets are constructed to represent the "best estimate" under various input assumptions, 
including selective variations of some of the parameters |5|, 0. From these results, how- 
ever, it is impossible to reliably assess the uncertainties of the PDFs or, more importantly, 
of the physics predictions based on them. The need to quantify the uncertainties for pre- 
cision SM studies and New Physics searches in the next generation of collider experiments 
has stimulated much interest in developing new approaches to this problem |]^, ^. Several 
attempts to quantify the uncertainties of PDFs in a systematic manner have been made 
recently g 0, [n|, 

The task is difficult because of the diverse sources of experimental and theoretical un- 
certainty in the global QCD analysis. In principle, the natural framework for studying 
uncertainties is that of the likelihood function |12, 14, 15|. If all experimental measurements 



were available in the form of mutually compatible probability functions for candidate theory 
models, then the combined likelihood function would provide the probability distribution 
for the possible PDFs that enter the theory. From this, all physical predictions and their 
uncertainties would follow. Unfortunately, such ideal likelihood functions are rarely avail- 
able from real experiments. To begin with, most published data sets used in global analysis 
provide only effective errors in uncorrelated form, along with a single overall normalization 
uncertainty. Secondly, published errors for some well-established experiments appear to fail 
standard statistical tests, e.g., the P^r degree of freedom may deviate significantly from 
1.0, making the data set quite "improbable". In addition, when the few experiments that are 
individually amenable to likelihood analysis are examined together, they appear to demand 
mutually incompatible PDF parameters. A related problem is that the theoretical errors 
are surely highly correlated and by definition poorly known. All these facts of life make the 
idealistic approach impractical for a real-world global QCD analysis. 

The problems that arise in combining a large number of diverse experimental and the- 
oretical inputs with uncertain or inconsistent errors are similar to the problems routinely 
faced in analyzing systematic errors within experiments, and in averaging data from mea- 
surements that are marginally compatible [0. Imperfections of data sets in the form of 
unknown systematic errors or unusual fiuctuations — or both — are a common occurrence. 
They need not necessarily destroy the value of those data sets in a global analysis; but we 
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must adapt and expand the statistical tools we use to analyze the data, guided by reasonable 
physics judgement. 

In this paper we develop a systematic procedure to study the uncertainties of PDFs 
and their physics predictions, while incorporating all the experimental constraints used in 
the previous CTEQ analysis flj. An effective function, called Xgiobai; used not only 
to extract the "best fit", but also to explore the neighborhood of the global minimum in 
order to quantify the uncertainties, as is done in the classic Error Matrix approach. Two key 
ingredients make this possible: (i) a recently established iterative procedure Jl^ that yields 
a reliable calculation of the Hessian matrix in the complex global analysis environment; and 
(ii) the use of appropriately normalized eigenvectors to greatly improve the accuracy and 
utility of the analysis. 

The Hessian approach is based on a quadratic approximation to Xgiobai neighbor- 
hood of the minimum that defines the best fit. It yields a set of PDFs associated with the 
eigenvectors of the Hessian, which characterize the PDF parameter space in the neighbor- 
hood of the global minimum in a process-independent way. In a companion paper, referred 
to here as LMM []T^, we present a complementary pro cess- dependent method that studies 
Xgiobai a function of whatever specific physical variable is of interest. That approach is 



based on the Lagrange Multiplier method [T^], which does not require a quadratic approx- 



imation to Xgiobai; hence is more robust; but, being focused on a single variable (or a 
few variables in a generalized formulation), it does not provide complete information about 
the neighborhood of the minimum. We use the LM method to verify the reliability of the 
Hessian calculations, as discussed in Sec. |[ Further tests of the quadratic approximation 
are described in Appendix ^ 

The outline of the paper is as follows. In Sec. ^ we summarize the global analysis 
that underlies the study, and define the function Xgiobai ^^^^ plays the leading role. In 
Sec. ^ we in explore the quality of fit in the neighborhood of the minimum. We derive the 
Eigenvector Basis sets, and show how they can be used to calculate the uncertainty on any 
quantity that depends on the parton distributions. In Sec. § we apply the formalism to derive 
uncertainties of the PDF parameters and of the PDFs themselves. In Sec. ^we illustrate the 
method further by finding the uncertainties on predictions for the rapidity distribution of W 
production, and the correlation between W and Z production cross sections. We summarize 
our results in Sec. Two appendices provide details on the estimate of overall tolerance for 
the effective Xgiobai function, and on the validity of the quadratic approximation inherent in 
the Hessian method. Two further appendices supply explicit tables of the coefficients that 
define the best fit and the Eigenvector Basis sets. The mathematical methods used here 
have been described in detail in [O . Some preliminary results have also appeared in [M H] . 



2 Global QCD Analysis and Effective Chi-squared 

Global analysis is a practical and effective way to combine a large number of experimental 
constraints. In this section, we describe the main features of the global QCD analysis, and 
explain how we quantify its uncertainties through the behavior of Xgiobai- 
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2.1 Experimental and theoretical inputs 

We use the same experimental input as the CTEQ5 analysis 15 data sets on neutral- 
current and charged- current deep inelastic scattering (DIS), Drell-Yan lepton pair production 
(DY), forward/backward lepton asymmetry from W production, and high px inclusive jets, 
as listed in Table |I| of Appendix 0. The total number of data points is 1295 after cuts, 
such as Q > 2 GeV and W > 4GeV in DIS, designed to reduce the influence of power-law 
suppressed corrections and other sources of theoretical error. The experimental precision and 
the information available on systematic errors vary widely among the experiments, which 
presents difficulties for the effort to quantify the uncertainties of the results. 

The theory input is next-to-leading-order (NLO) perturbative QCD. The theory has sys- 
tematic uncertainties due to uncalculated higher-order QCD corrections, including possible 
resummation effects near kinematic boundaries; power-suppressed corrections; and nuclear 
effects in neutrino data on heavy targets. These uncertainties — even more than the experi- 
mental ones — are difficult to quantify. 

The theory contains free parameters {a-i} = {ai, . . . , a^} defined below that characterize 
the nonperturbative input to the analysis. Fitting theory to experiment determines these 
{oj} and thereby the PDFs. The uncertainty of the result due to experimental and theoretical 
errors is assessed in our analysis by an assumption on the permissible range of Ax^ for the 
fit, which is discussed in Sec. 

2.2 Parametrization of PDFs 

The PDFs are specified in a parametrized form at a fixed low energy scale Qq, which we 
choose to be 1 GeV. The PDFs at all higher Q are determined from these by the NLO 
perturbative QCD evolution equations. The functional forms we use are 

fix, Qo) = A^x^'{l- xf' (1 + A3 x""') (1) 

with independent parameters for parton fiavor combinations = u — u, dy = d ~ d, g, and 
u + d . We assume s = s = 0.2 [u + d) a.t Qq. A somewhat different parametrization for the 
d/u ratio is adopted to better fit the current data: 

d{x, Qo)/u{x, Qo) = Bo (1 - x)^^ + (1 + B^x) (1 - x)^^ . (2) 

The specific functional forms are not crucial, as long as the parametrization is fiexible enough 
to include the behavior of the true — but unknown — PDFs at the level of accuracy to which 
they can be determined. The parametrization should also provide additional flexibility to 
model the degrees of freedom that remain indeterminate. On the other hand, too much 
flexibility in the parametrization leaves some parameters poorly determined at the minimum 
of x^- To avoid that problem, some parameters in the present study were frozen at particular 
values. 

The number of free parameters has increased over the years, as the accuracy and diversity 
of the global data set has gradually improved. A useful feature of the Hessian approach is 
the feedback it provides to aid in reflning the parametrization, as we discuss in Sec. [4.1| . 



4 



The current analysis uses a total of = 16 independent parameters, referred to generically 
as {tti}. Their best-fit values, together with the fixed ones, are listed in Table ^ of Appendix 
0. (Some of the fit parameters are defined by simple functions of their related PDF 
shape parameters Ai or Bi, as indicated in the table, to keep their relevant magnitudes in 
a similar range, or to enforce positivity of the input PDFs, etc.) The set of fit parameters 
{ai} could also include parameters associated with correlated experimental errors, such as 
an unknown normalization error that is common to all of the data points in a particular 
experiment; however, such parameters were kept fixed for simplicity in this initial study. 
The QCD couphng was similarly fixed at as{Mz) = 0.118 . 



2.3 Effective chi-squared function 

Our analysis is based on an effective global chi-squared function that measures the quality 
of the fit between theory and experiment: 

Xglobal = ^WnXi (3) 
n 

where n labels the 15 different data sets. 

The weight factors in (Q), with default value 1, are a generalization of the selection 
process that must begin any global analysis, where one decides which data sets to include 
{w = 1) or exclude {w = 0). For instance, we include neutrino DIS data (because it con- 
tains crucial constraints on the PDFs, although it requires model-dependent nuclear target 
corrections); but we exclude direct photon data (which would help to constrain the gluon 
distribution, but suffers from delicate sensitivity to k± effects from multiple soft gluon emis- 
sion). The Wn can be used to emphasize particular experiments that provide unique physical 
information, or to de-emphasize experiments when there are reasons to suspect unquantified 
theoretical or experimental systematic errors (e.g. in comparison to similar experiments). 
Subjectivity such as this choice of weights is not covered by Gaussian statistics, but is a 
part of the more general Bayesian approach; and is in spirit a familiar aspect of estimat- 
ing systematic errors within an experiment, or in averaging experimental results that are 
marginally consistent. 

The generic form for the individual contributions in Eq. (H) is 



2 _j2(^£i±ziky (4) 



where T„/, Dni, and (T„/ are the theory value, data value, and uncertainty for data point / of 
data set (or "experiment") n. In practice, Eq. (Q) is generalized to include correlated errors 
such as overall normalization factors; or even the full experimental error correlation matrix 
if it is available [B^ 



The value of Xgiobai depends on the PDF set, which we denote by S. We stress that 
Xgiobai('S') is an "effective x^"? whose purpose is to measure how well the data are fit by 
the theory when the PDFs are defined by the parameter set {ai{S)}. We use Xgiobai('S') to 
study how the quality of fit varies with the PDF parameters; but we do not assign a priori 
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statistical significance to specific values of it — e.g.^ in the manner that would be appropriate 
to an ideal chi-squared distribution — since the experimental and theoretical inputs are often 
far from being ideal, as discussed earlier. 



2.4 Global minimum and its neighborhood 

Having specified the effective function, we find the parameter set that minimizes it to 
obtain a "best estimate" of the true PDFs. This PDF set is denoted by 5*0 .0 The parameter 
values that characterize 5*0 are listed in Table |^ of Appendix |C[ 

To study the uncertainties, we must explore the variation of Xgiobai neighborhood 
of its minimum, rather than focusing only on 5*0 as has been done in the past. Moving the 
parameters away from the minimum increases Xgiobai ^y an amount ^X^^ioi^ai- It is natural to 
define the relevant neighborhood of the global minimum as 

AXglobal < (5) 

where T is a tolerance parameter. The Hessian formalism developed in the next section 
(Sec. ^ provides a reliable and efficient method of calculating the variation of all predictions 
of PDFs in this neighborhood, as long as T is within the range where a quadratic expansion 
of Xgiobai terms of the PDF parameters is adequate. 

In order to quantify the uncertainties of physical predictions that depend on PDFs, 
one must choose the tolerance parameter T to correspond to the region of "acceptable fits" . 
Broadly speaking, the order of magnitude of T for our choice of Xgiobai is already suggested by 
self-consistency considerations: Our fundamental assumption — that the 15 data sets used in 
the global analysis are individually acceptable and mutually compatible, in spite of departures 
from ideal statistical expectations exhibited by some of the individual data sets, as well as 
signs of incompatibility between some of them if the errors are interpreted according to strict 
statistical rules ||12| — must, in this effective approach, imply a value of T substantially 
larger than that of ideal expectations. More quantitatively, estimates of T have been carried 
out in the companion paper LMM |18[, based on the comparison of ^x\\ohai with detailed 



studies of experimental constraints on specific physical quantities. The estimates of T will 
be discussed more extensively in Sec. ^, where applications are presented, and in Appendix 
^ For the development of the formalism in the next section, it suffices to know that (i) the 
order of magnitude of these estimates is 

10 to 15; (6) 

and (a) the master formulas given in Sec. show that all uncertainties are proportional to 
T, so that results derived for a particular value of T can easily be scaled to other estimates 
of T if desired. 



^It is very similar to the CTEQ5M1 set with minor differences arising from the improved parametriza- 
tion (|) for d/u. 
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3 The Hessian formalism 



The most efficient approach to studying uncertainties in a global analysis of data is through 
a quadratic expansion of the function about its global minimum.^ This is the well known 
Error Matrix or Hessian method. Although the method is standard, its application to PDF 
analysis has, so far, been hindered by technical problems created by the complexity of the 
theoretical and experimental inputs. Those technical problems have recently been overcome 
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2-dim (i,j) rendition ofd-dim (-16) PDF parameter space 

contours of constant gigy^i 
vif. eigenvector in the l-direction 
p(/): point of largest a,- with tolerance T 
So- global minimum 

diagonalization and 

rescaling by ^ 
the iterative method 

' Hessian eigenvector basis sets 




Original parameter basis 



(b) 

Orthonormal eigenvector basis 



Figure 1: Illustration of the basic ideas of our implementation of the Hessian method. 



An iterative procedure [0] diagonalizes the Hessian matrix and rescales the eigenvectors to 
adapt the step sizes to their natural scale. The solid points represent the resulting eigenvector 



basis PDFs described in Sec. |3]^. Point p(i) is explained in Sec. |0 . 



The Hessian matrix is the matrix of second derivatives of at the minimum. In our 
implementation, the eigenvectors of the Hessian matrix play a central role. They are used 
both for accurate evaluation of the Hessian itself, via the iterative method of Jl^, and to 
produce an Eigenvector Basis set of PDFs from which uncertainties of all physical observables 
can be calculated. These basis PDFs provide an optimized representation of the parameter 
space in the neighborhood of the minimum x^. 

The general idea of our approach is illustrated conceptually in Fig. |1|. Every PDF set 
S corresponds to a point in the (i-dimensional PDF parameter space. It can be specified by 
the original parton shape parameters {ai{S)} defined in Sec. pT^ , as illustrated in Fig. |I](a); 
or by the Eigenvector Basis coordinates {zk{S)}, which specify the components of S along 
the Eigenvector Basis PDFs that will be introduced in Sec. |3.3| , as illustrated in Fig. |T|(b). 
The solid points in both (a) and (b) represent the basis PDF sets. 



''The Lagrange multiplier method [ p^ 
proximation. 
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is a complementary approach that avoids the quadratic ap- 
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3.1 Quadratic approximation and the Hessian matrix 

The standard error matrix approach begins with a Taylor series expansion of Xgiobai('S') 
around its minimum 5*0, keeping only the leading terms. This produces a quadratic form in 
the displacements from the minimum: 

d d 

Ax' = - Xo' = E E - («^- - «?) (7) 

i=i j=i 

where Xo = X^i^o) is the value at the minimum, {a°} = {aj^So)} is its location, and 
{aj} = {aj{S)}. We have dropped the subscript "global" on ior simplicity. We also 
suppress the PDF argument (S) in x^ and {aj} here and elsewhere for brevity. 

The Hessian matrix Hij has a complete set of orthonormal eigenvectors Vik defined by 

d 

E Hij Vjk = ek Vik (8) 

d 

E = , (9) 

i=l 

where {e^} are the eigenvalues and 6ik is the unit matrix. Displacements from the minimum 
are conveniently expressed in terms of the eigenvectors by 

d 

fli - a° = E ' ^^^^ 

k=l 

where scale factors Sk are introduced to normalize the new parameters Zk such that 

Ax' = E ■ (11) 
fe=i 

With this normalization, the relevant neighborhood of the global minimum corresponds 
to the interior of a hypersphere of radius T: 

E < ■ (12) 

k=l 

In the ideal quadratic approximation, the scale factors Sk would be equal to ^/l/tk ■ However, 
if x^ is not perfectly quadratic, then Sk differs somewhat from ^JxJTk^ as explained in 
Appendix ^. 

The transformation ( pUj ) is illustrated conceptually in Fig. |^, where {fc, label two of 
the eigenvector directions. One of the eigenvectors is shown both in the original parameter 
basis and in the normalized eigenvector basis. 



8 



3.2 Eigenvalues of the Hessian matrix 



The square of the distance in parameter space from the minimum of is 

d d 



E 



1=1 



(13) 



k=l 



by (|9|)-([T0|). Because Sk ~ \/^l^k-, an eigenvector with large eigenvalue therefore corre- 
sponds to a "steep direction" in space, z.e., a direction in which rises rapidly, making 
the parameters tightly constrained by the data. The opposite is an eigenvector with small 
efc, which corresponds to a "shallow direction", for which the criterion Ax^ < permits 
considerable motion — as is the case for illustrated in Fig. |l|. 

The distribution of eigenvalues, ordered from largest to smallest, is shown in Fig. ^ 
Interestingly, the distribution is approximately linear in log e . The eigenvalues span an 
enormous range, which is understandable because the large global data set includes powerful 
constraints — particularly on combinations of parameters that control the quark distributions 
at moderate x — leading to steep directions; while free parameters have purposely been added 
to ([l|)-(@) to the point where some of them are at the edge of being unconstrained by the 
data, leading to shallow directions. 
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Figure 2: Distribution of eigenvalues of the Hessian matrix for fits using d = 13, 16 
(standard), and 18 free PDF parameters. 

Fig. also shows how the range of eigenvalues expands or contracts if the number of 
adjustable parameters is changed: the 16-parameter fit is the standard one used in most 
of this paper; the 18-parameter fit is defined by allowing A"" 7^ Af"-' and A?^ ^ with 
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= 6; the 13-parameter fit is defined by A^" = Af = 1 and Af^^ = 0. The range 
spanned by the eigenvalues increases with the dimension d of the parameter space (roughly 
as log(ei/ed) cx 

The large (10^:1) range spanned by the eigenvalues makes the smaller eigenvalues and 
their eigenvectors very sensitive to fine details of the Hessian matrix, making it difficult to 
compute Hij with sufficient accuracy. This technical problem hindered the use of Hessian 
or Error Matrix methods in global QCD analysis in the past. The problem has been tamed 
by an iterative method introduced in , which computes the eigenvalues and eigenvectors 
by successive approximations that converge even in the presence of numerical noise and 
non-quadratic contributions to -S 

The Hessian method relies on the quadratic approximation to the effective func- 
tion. We have extensively tested this approximation in the neighborhood of interest by 
comparing it with the exact Xgiobai- The results are satisfactory, as shown in Appendix |B|, 
which also explains how the approximation is improved by adjusting the scale factors for 
the shallow directions. 



3.3 PDF Eigenvector Basis sets Sf 

The k^^ eigenvector of the Hessian matrix has component Vik along the i*^ direction in the 
original parameter space, according to (||). Thus Vik is the orthogonal matrix that transforms 
between the original parameter basis and the eigenvector basis. For our application, it is 
more convenient to work with coordinates {zk} that are normalized by the scale factors {sk} 
of (piUD, rather than the "raw" coordinates of the eigenvector basis. Thus we use the matrix 

Mik = VikSk (14) 

rather than Vik itself. Mj^ defines the transformation between the two descriptions that are 
depicted conceptually in Fig. |l|: 

d 

«i - «° = 5Z ■ (^^) 

k=l 

It contains information about the physics in the global fit, together with information related 
to the choice of parametrization, and is a good object to study for insight into how the 
parametrization might be improved, as we discuss in Sec. |4.1| . 

The eigenvectors provide an optimized orthonormal basis in the PDF parameter space, 
which leads to a simple parametrization of the parton distributions in the neighborhood of 
the global minimum So- In the remainder of this section, we show how to construct these 
Eigenvector Basis PDFs {5"^, i = 1, . . . , c?} ; and in the following subsection, we show how 
they can be used to calculate the uncertainty of any desired variable X{S). 

The Eigenvector Basis sets Sf are defined by displacements of a standard magnitude t 
"up" or "down" along each of the d eigenvector directions. Their coordinates in the z-basis 

'^The iterative algorithm is implemented as an extension to the widely used CERNLIB program MINUIT 
. The code is available at http://www.pa.msu.edu/~pumplin/iterate/. 
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are thus 

zu{Sf) = ±t5M. (16) 

More exphcitly, is defined by (2:1, ... , Zd) = {t,0, . . . , 0), etc. We make displacements in 
both directions along each eigenvector to improve accuracy; which direction is called "up" is 
totally arbitrary. As a practical matter, we choose t = 5 for the displacement distance. This 
choice improves the accuracy of the quadratic approximation by working with displacements 
that have about the same size as those needed in applications.^ 

The {oj} parameters that specify the Eigenvector Basis sets Sf" are given by 

a,{S^) - a'l = ±tM,, (17) 

by (0)-(|l|). Hence 

a,{Si) - ai{Si) = 2tMie. (18) 

Interpreted as a difference equation, this shows directly that the element Ma of the trans- 
formation matrix is equal to the gradient of parameter Oj along the direction of z^]^ 

Basis PDF sets along two of the eigenvector directions are illustrated conceptually in 
Fig. |1| (a) and (b) as solid points displaced from the global minimum set Sq. The coefficients 
that specify all of the sets are listed in Table § of Appendix 0. 



3.4 Master equations for calculating uncertainties using the Eigen- 
vector Basis sets {Sf} 

Let X{S) be any variable that depends on the PDFs. It can be a physical quantity such 
as the W production cross section; or a particular PDF such as the gluon distribution at 
specific X and Q values; or even one of the PDF parameters Oj. All of these cases will be 
discussed as examples in Sections H and |^. 

The best-fit estimate for X is X^ = X{So). To find the uncertainty, it is only necessary 
to evaluate X for each of the 2d sets {Sf}. The gradient of X in the ^-representation can 
then be calculated, using a linear approximation that is essential to the Hessian method, by 

dX ^ X{St)-X{S^) 

dzk 2t ^ ^ 

where t is the scale used to define {Sf} in (|16]). It is useful to define 

D,{X) = X(5+)-X(5,) (20) 

D{X) = l^mX)]'] (21) 

Dk{X) = Dk{X)/D{X), (22) 



'^The value chosen for t is somewhat smaher than the typical T given in (^) because in applications, the 
component of displacement along a given eigenvector direction is generally considerably smaller than the 
total displacement. 

^Technically, we calculate the orthogonal matrix Vij using displacements that give Ax^ — 5 where the 
iterative procedure [ p7[ converges well. The eigenvectors arc then scaled up by an amount that is adjusted 
to make Ax^ = 25 exactly for each to improve the quadratic approximation. 
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so that Dk[X) is a vector in the gradient direction and Dk{X) is the unit vector in that 
direction. 

The gradient direction is the direction in which X varies most rapidly, so the largest 
variations in X permitted by ( |T^ ) are obtained by displacement from the global minimum 
5*0 in the gradient direction by a distance ±T. Hence 



AX 



k=l 



(23) 



From this, using (p!9D-(p2D, we obtain the master equation for calculating uncertainties. 




(24) 



This equation is applied to obtain numerical results in Sections ^ and 

For applications, it is often important also to construct the PDF sets and that 
achieve the extreme values X = X° ± AX. Their 2;-coordinates are 

Zk{S^) = ±TDk{X) , (25) 

which follows from the derivation of (p^. Their physical parameters {oj} then follow from 
Eqs.([l|) and (|18D: 



^ 5,(X) [a,(5^ 



ai{S^ )] 



k=l 



(26) 



In practice, we calculate the parameters for Sx and Sx by applying (p6D directly to the 
parton shape parameters IuAq", A"",. .. listed in Table ^, except that the normalization 
factors Aq", Aq", and Aq are computed from the momentum sum rule and quark number 
sum rules 



/ X fi{x) dx = 1 
Jo 



'0 

Uv{x) dx = 2, / dy{x) dx 
'0 Jo 

to ensure that those sum rules are satisfied exactly. 



(27) 



(28) 



4 Uncertainties of Parton Distributions 

4.1 Uncertainties of the PDF parameters {a^} 

As a useful as well as illustrative application of the general formalism, let us find the uncer- 
tainties on the physical PDF parameters a^. We only need to follow the steps of Sec. p^ . 
Letting X = for a particular i, Eqs. (|20|) and (^) give 



Dk{ai) = a,(S+) - ai{S^) = 2t Mik • (29) 
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The uncertainty on in the global analysis follows from the master equation (^^: 

^«^ = ^(E^^^] • (30) 

The parameter sets {djiaf)} and {aj{a^)} that produce the extreme values of a, can 
be found using PB|). In the conceptual Fig. ^ the parton distribution set with the largest 
value of ttj for Axgi^j^^^j = is depicted as point p(i). 

The uncertainties {Aoj} of the standard parameter set, calculated from (^) with T = 5 
are listed along with the central values {a°} in Table ^. To test the quadratic approximation, 
asymmetric errors are also listed. These are defined by displacements in the gradient direction 
(p9|) that are adjusted to make Ax^ exactly equal to = 25. They agree quite well with 
the errors calculated using (pO]), which shows that the quadratic approximation is adequate 
for our purposes. 

Table H also lists the components of the displacement vectors Zk{a^) of (^5]) , which have 
been renormalized to ^ 2;^ = 1. These reveal which features of the PDFs are governed most 
strongly by specific eigenvector directions. The table is divided into sections according to 
the various flavor combinations that are parametrized. One can see for example that the 
flattest direction ziq is strongly related to the gluon parameters Af and , confirming that 
the gluon distribution at Qo = 1 GeV is a highly uncertain aspect of the PDFs. The second- 
flattest direction zi^ relates mainly to the d/u ratio, as seen by the large components along 

2i5 for Bq^^ and Bf^^. Meanwhile, the steepest direction zi mainly influences the valence 
quark distribution, via A^" . 

All of the parametrized aspects of the PDFs at Qo, namely u^, dy, g, d + u, and d/u 
receive substantial contributions from the four flattest directions 13-16, which shows that 
the current global data set could not support the extraction of much finer detail in the PDFs. 
This can be confirmed by noting that the error ranges of the individual parameters are 
not small. 



4.2 Uncertainties of the PDFs 

The uncertainty range of the PDFs themselves can also be explored using the eigenvector 
method. For example, letting the gluon distribution g{x, Q) at some specific values of x and 
Q be the variable X that is extremized by the method of Sec. |3.4| leads to the extreme gluon 
distributions shown in the left-hand side of Fig. 0. The envelope of such curves, obtained 
by extremizing at a variety of x values at fixed Q, is shown by the shaded region, which is 
defined by T = 10, i.e., by allowing Xgiobai above its minimum value. 

The right-hand side of Fig. ^ similarly shows the allowed region and two specific cases 
for the M-quark distribution. The uncertainty is much smaller than for the gluon, reflecting 
the large amount of experimental data included in the global analysis that is sensitive to the 
u quark through the square of its electric charge. 

The dependence on x in these figures is plotted as a function of x^/^ to better display 
the region of current experimental interest. The values are weighted by a factor x^/^, which 
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Figure 3: Two extreme gluon distributions (left) and w-quark distributions (right) for 
Q = 2GeV {long dash) and (5 = 100GeV {short dash) with T = 10. Each curve is calculated 
to give the minimum or maximum value for some particular x. The entire allowed region, 
which is the envelope of all such curves, is shaded. 



makes the area under each curve proportional to its contribution to the momentum sum rule. 
Note that the uncertainty decreases markedly with increasing Q as a result of evolution. Also 
note that the gluon distribution is large and fairly well determined at smaller x values and 
large Q — the region that will be vital for physics at the LHC 

Figure ^ displays similar information for Q = 10 GeV, expressed as the fractional un- 
certainty as a function of logo;. It shows that the gluon distribution becomes very uncertain 
at large x, e.g., a; > 0.25. (At x > 0.6, where the distribution is extremely small, the lower 
envelope of fractional uncertainty begins to rise. This is an artifact of the parametrization 
with = 0; e.g., making the parametrization more flexible by freeing with = 6 leads 
to a broader allowed range indicated by the dotted curves.) 

The boundaries of the allowed regions for the PDFs are not themselves possible shapes 
for the PDFs, since if a particular distribution is extremely high at one value of x, it will 
be low at other values. This can be seen most clearly in the gluon distributions of Figs. ^ 
and ^, where the extreme PDFs shown push the envelope on the high side in one region of 
X, and on the low side in another. 



5 Uncertainties of Physical Predictions 

In applying the Hessian method to study the uncertainties of physical observables due to 
PDFs, it is important to understand how the predictions depend on the tolerance parameter 
T, and how well T can be determined. We discuss these issues first, and then proceed to 
illustrate the utility of this method by examining the predictions for the rapidity distribution 



14 



0.5 ^ 
10" 



10" 



I 

10~ 



10° 



0.8 ^ 
10" 



10" 



10" 



10" 



Figure 4: Ratio of gluon (left) and w-quark (right) distributions to Best Fit Sq at Q 
10 GeV. 



of W and Z boson production as well as the correlation of W and Z cross-sections in pp 
collisions. 

First, note that the uncertainties of all predictions are linearly dependent on the tol- 
erance parameter T in the Hessian approach, by the master formula (p^ ; hence they are 
easily scalable. The appropriate value of T is determined, in principle, by the region of 
"acceptable fits" or "reasonable agreement with the global data sets" in the PDF parameter 
space. Physical quantities calculated from PDF sets within this region will range over the 
values that can be considered "likely" predictions. The language used here is intentionally 
imprecise, because as discussed in the introductory sections, the experimental and theoreti- 
cal input to the function Xgiobai global analysis — in particular the unknown systematic 
errors reflected in apparent abnormalities of some reported experimental errors as well as 
incompatibilities between some data sets — makes it difficult to assign an unambiguous value 



to T. However, as mentioned in Sec. |2.4| , self-consistency considerations inherent in our basic 
assumption that the 15 data sets used in the global analysis are acceptable and compatible, 
in conjunction with the detailed comparison to experiment conducted in LMM using 
the same Xgiobai function, yield a best estimate of T 10 to 15 (Eq. Details of these 
considerations are discussed in Appendix 0. 

Of the estimates of T described there, the most quantitative one is based on the algo- 
rithm of LMM to combine 90% confidence level error bands from the 15 individual data 
sets for any specific physical variable such as the total production cross section oiWoiZ 
at the Tevatron or LHC [Cf. Appendix A for a summary, and LMM [^| for the detailed 
analysis.) For the case of cr^y^, the uncertainty according to the specific algorithm is ±4%, 
corresponding to T ~ 13. With our working hypothesis T 10 - 15, the range of the 
uncertainty of a^^^ will be ±3.3% to ±4.9%. 

The numerical results on applications presented in the following subsections are obtained 
with the same choice of T as in Sec. ^, i.e. T = 10. Bearing in mind the linear dependence of 
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the uncertainties on T, one can easily scale these up by the appropriate factor if a more con- 
servative estimate for the uncertainty of any of the physical quantities is desired. We should 
also note that the experimental data sets used in this analysis are continuously evolving. 
Some data sets (cf. Table |] in Appendix A) will soon be updated (HI, ZEUS) or replaced 
(CCFR)J] In addition, theoretical uncertainties have yet to be systematically studied and 
incorporated. Therefore, the specific results presented in this paper should be considered 
more as a demonstration of the method rather than definitive predictions. The latter will 
be refined as new and better inputs become available. 

5.1 Rapidity distribution for W production 

Figure ^ shows the predicted rapidity distribution da/dy for production in pp collisions 
at ^/s = 1.8 TeV. The cross section is not symmetric in y because of the strong contribution 
from the valence w-quark in the proton — indeed, the forward/backward asymmetry produces 
an observable asymmetry in the distribution of leptons from W decay, which provides an 
important handle on flavor ratios in the current global analysis. 




Figure 5: Left: Predicted rapidity distribution for pp — > + X at ^/s = 1.8 TeV. The 
curves are extreme predictions for the integrated cross section a (solid), or the rapidity 
moments (y) {long dash), or (y^) {short dash). Right: same except the Best Fit prediction 
is subtracted to show the details better. 

The left-hand side of Fig. |^ shows the six rapidity distributions that give the extremes 
(up or down) of the integrated cross section a = J ^dy, the first moment (y), or the 
second moment (y^), as calculated using the Hessian formalism for T = 10. To show the 
differences more clearly, the right-hand side shows the difference between each of these 
rapidity distributions and the Best Fit distribution. 

^Cf. Talks presented by these collaborations at DIS2000 Workshop on Deep Inelastic Scattering and 
Related Topics, Liverpool, England, April 2000. 
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Figure 6: Comparison with Lagrange Multiplier method: Three of the six curves from Fig. ^ 
(corresponding to maximum o", (y), or (y^)) are shown, together with the result of the exact 
LM method for AxLbai = 100. 



Figure ^ shows three of the same difference curves as in Fig. |^ along with results ob- 



tained using the Lagrange Multiplier method of LMM |]T8|. The good agreement shows that 
the Hessian formalism, with its quadratic approximation (|^), works well at least for this 
application. 

Figure ^ shows the same three curves from Fig. ^ together with 6 random choices of the 
PDFs with AXgijj(jg^j = 100. These random sets were obtained by choosing random directions 
in {zi} space and displacing the parameters from the minimum in those directions until 
Xgiobai increased by 100. Note that none of this small number of random sets give good 
approximations to the three extreme curves. This is not really surprising, since the extrema 
are produced by displacements in specific (gradient) directions; and in 16- dimensional space, 
the component of a random unit vector along any specific direction is likely to be small. But 
it indicates that producing large numbers of random sets would at best be an inefficient way 
to unearth the extreme behaviors. 
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Figure 7: Comparison with random methods: Three of the six curves from Fig. ^ (corre- 
sponding to maximum a, (y), or (y^) for A^gj^j^g^j = 100) are shown, together with results 
from PDF sets that are obtained by displacement in 6 random directions of {zi} space by 
AXgiobai = 100. 
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5.2 Correlation between W and Z cross sections 



One can ask what are the error hmits on two quantities X and Y simultaneously, according 
to the AXgjjjjjj^j < criterion. In the Hessian approximation, the boundary of the allowed 
region is an ellipse The ellipse can be expressed elegantly in a "Lissajous figure" form 

X = X° + AX sm{e + (f)) 

Y = y° + AF sin(^) , (31) 

where < 6' < 27r traces out the boundary. The shape of the ellipse is governed by the phase 
angle 0, which is given by the dot product between the gradient vectors for X and Y in {zk} 
space: 

d 

cos(P = Y, MX) Dk{Y) , (32) 

k=l 

where -Dj(X) and Di{Y) are defined by (p^). 

As an example of this, T = 10 error limits for and production at the Tevatron 
are shown in Fig. p. The error limits on the separate predictions for these cross sections are 
each about 3.3% for T= 10. The predictions are strongly correlated (cos0 = 0.60), in part 
because the same quark distributions — in different combinations — are responsible for both 
W and Z production, and in part because the uncertainties of all the quark distributions 
are negatively correlated with the more uncertain gluon distribution, and hence positively 
correlated with each other. 

The W and Z cross sections from CDF (dashed) and D0 (dotted) are also shown in 
Fig. H |31|]. (The measured quantities aw ■ Bw^ev and az ■ Bz^e+e- were converted to aw 



and az using world average values for the branching ratios |jT6[ ; the measured CDF and D0 
branching ratios for W agree with the world average to within about 1%.) The data points 
are shown in the form of error bars defined by combining statistical and systematic errors 
(including the errors in decay branching ratios) in quadrature. The errors in these measure- 
ments are also highly correlated, in part through the uncertainty in overall luminosity which 
both cross sections are proportional to — so the experimental points would also be better 
represented by ellipses. The two experiments in fact use different assumptions for the inelas- 
tic pp cross section which measures the luminosity: CDF uses its own measurement, while 
D0 uses the world average. The dot-dashed data point shows the result of reinterpreting 
the CDF point by scaling the luminosity down by a factor 1.062 to correspond to the world 
average pp cross section 



6 Summary and Concluding Remarks 

Experience over the past two decades has shown that minimizing a suitably defined Xgiobai 
an effective way to extract parton distribution functions consistent with experimental con- 
straints in the PQCD framework. The goal of this paper has been to expand the analysis to 
make quantitative estimates of the uncertainties of PDFs and their predictions, by examin- 
ing the behavior of Xgiobai neighborhood of the minimum. The techniques developed 
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Figure 8: Error ellipse for predicted W and Z boson production in pp collisions at 1.8 TeV. 
The error limit (T = 10) of the prediction is the interior of the ellipse. Error bars show 
data from D0 (dotted) and CDF (dashed). The dot-dash error bars show the result of 
reinterpreting the CDF data by using the same assumption for luminosity as D0 [Rll] . 



in Ref. allow us to apply the traditional error matrix approach reliably in the global 
analysis environment. The eigenvectors of the Hessian (inverse of the error matrix) play a 
crucial role, both in the adaptive procedure to accurately calculate the Hessian itself, and 
in the derivation of the master formula (p^ for determining the uncertainties of parton 
distributions and their predictions. 

Our principal results are: (i) the formalism developed in Sec. O, leading to the master 
formulas; and (ii) the Best Fit parton distribution set Sq plus the 2d Eigenvector Basis 
sets presented in Sec. ^]3|, which are used in apphcations of the master formula. The 
uncertainties are proportional to T, the tolerance parameter for Axgiohai- We present several 
estimates, based on current experimental and theoretical input, that suggest T is in the range 
10 - 15. It is important to note, however, that this estimate can, and should, be refined in 
the near future. First, several important data sets used in the global analysis will soon be 
updated or replaced. Secondly, there are other sources of uncertainties which have yet to be 
studied and included in the analysis in a full evaluation of uncertainties. (The work of Botje 
| T0| describes possible ways to incorporate some of these.) 

This paper, focusing on the presentation of a new formalism and its utility, represents 
the first step in a long-term project to investigate the uncertainties of predictions dependent 
upon parton distributions. We plan to perform a series of studies on processes in precision 
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SM measurements (such as the W mass) and in new physics searches (such as the Higgs 
production cross section), which are sensitive to the parton distributions. 
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A Estimates of the Tolerance Parameter for A^gj^^g^i 

This appendix provides details of the various approaches mentioned in Sec. and Sec. ^ 
to estimate the tolerance parameter T defined in Eq. (^). In our global analysis based on 
^Xgiobai' uncertainties of predictions of the PDFs according to the master formula Eq. ( ^i]) 
are directly proportional to the value of T. 

The first two estimates rest on considerations of self-consistency which are required by 
our basic assumption that the 15 data sets used in the global analysis (see Table P are 
acceptable and mutually compatible — in spite of the departure from ideal statistical expecta- 
tions exhibited within many of the individual data sets, as well as apparent incompatibility 
between experiments when the errors are interpreted according to strict statistical rules | ]12| |. 
A third estimate follows from the analyses in our companion paper LMM ||18[. Based on 
these three estimates, we adopt the range T f« 10 to 15 as our working hypothesis, as was 
quoted in Eq. (||), and used in Sees. | and || to obtain the numerical results shown in the 
plots. 

Finally, it is of interest to compare these estimates of the tolerance parameter with 
the traditional — although by now generally recognized as questionable — gauge provided by 
differences between published PDFs. 

1. Tolerance required by acceptability of the experiments: One can examine 
how well the best fit 5*0 agrees with the individual data sets, by comparing Xn in Eq. (|) 
with the range iV„ ± a/2A^„ that would be the expected 1 a range if the errors were ideal. 
The largest deviations are found to lie well outside that range: Xn~-^n (a/2A^„) = 65.5 (17.7), 
-64.8(18.5), 65.1 (19.3), -25.9(15.4), 22.4(8.1), for experiments n = 2,3,4,10,15 respec- 
tively. By attributing the "abnormal" x^'s to unknown systematic errors or unusual fiuctua- 
tions (or both), and accepting them in the definition of Xgiobai ^'^^ global analysis, we must 
anticipate a tolerance for the latter which is larger than that for an "ideal" x^-function. ( Cf. 



Appendix A of LMM [|18[ for a quantitative discussion of the increase in T due to neglected 
systematic errors.) Since the sources of the deviation of these real experimental errors from 
ideal expectations are not known, it is not possible to derive specific values for the overall 
tolerance. However, the sizes of the above quoted deviations (which, in each case, imply 
a very improbable fit to any theory model, according to ideal statistics) suggest that the 
required tolerance value for the overall AXgi^^j^^^j (involving 1300 data points) must be rather 
large. 

2. Tolerance required by mutual compatibility of the experiments: We can 

quantify the degree of compatibility among the 15 data sets by removing each one of them 
in turn from the analysis, and observing how much the total for the remaining 14 sets 
can be lowered by readjusting the {oj}. This is equivalent to minimizing x^ for each possible 
14-experiment subset of the data, and then asking how much increase in the x^ for those 14 
experiments is necessary to accommodate the return of the removed set. These increases are 
listed as A„ in Table |l]. They range up to ^20. In other words, we have implicitly assumed 
that when a new experiment requires an increase of 20 in the Xgiobai of ^ plausible global 
data set, that new experiment is nevertheless sufficiently consistent with the global set that 
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Expt n 


Process 


Nn 


Name Rcf. 


A„ 


1 


DIS F2(up) 


168 


BCDMS 


IE 




19.7 


2 


DIS F2(^d) 
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m 




4.5 
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DIS -F2(ep) 
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HI 


m 




3.7 
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9.7 


5 
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l^L^r It 
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8.9 
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m 
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10 


V)-Y pp 
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E605 


m 




6.4 


11 


D-Y pd/pp 


1 


NA51 


m 




0.5 


12 


D-Y pd/pp 


11 


E866 


m 




0.6 


13 


^lept. asym. 


11 


CDF 


m 




15.1 


14 


^ jet X 


24 


D0 






3.4 


15 


pp — » jet X 


33 


CDF 






3.7 



Table 1: Data sets used in the global analysis. If experiment n is omitted, A„ denotes the 
amount by which for the remaining 14 experiments can be reduced by readjusting the fit 
parameters. 

it can be included as an equal partner.| Hence the value of must be substantially larger 
than 20. 

3. Tolerance calculated from confidence levels of individual experiments: In 



18| , we examine how the quality of fit to each of the 15 individual experiments varies as 
a function of the predicted value for various specific observable quantities such as aw or 
az- The fit parameters {oj} are continuously adjusted by the Lagrange Multiplier method 
to yield the minimum possible value of Xgiobai given values of the chosen observable. 
The constrained fits obtained this way, interpreted as "alternative hypotheses" in statistical 
analysis, are then compared to each of the 15 data sets to obtain a 90% confidence level 
error range for the individual experiments. Finally, these errors are combined with a definite 
algorithm to provide a quantifiable uncertainty measure for the cross section. In the case of 
the W production cross section at the Tevatron, a^^, this procedure yields an uncertainty of 
±4%, which translates into a value of ~ 180 for Axgiobab or T ^ 13. This method is definite, 
but it is, in principle, process-dependent. However, when the same analysis is applied to 
(tJ*^^, cr^*-^ and cr^^'^ (which probe different directions in the PDF parameter space), we find 
AXgiobai be consistently in the same range as for cr^^, even though the percentage errors 
on the cross section vary from 4% at the Tevatron to 10% at LHC 

4. Comparison of tolerance figures to differences between published PDFs: 

Table lists the value obtained when our Xgiobai computed using various current and 
historical PDF sets. The Ax^ column lists the increase over the CTEQ5M1 set. Typical 
values for the modern sets are similar to the range 100 - 225 that corresponds to T = 10 - 15. 
For previous generations of PDF sets, Xgiobai much larger — not surprisingly, because the 
obsolete sets were extracted from much less accurate data, and without some of the physical 



^Since 5 or 6 of the experiments require A„ in the range of 10 to 20, this level of inconsistency is not 
caused by problems with just one particular experiment — which would simply invite the permanent removal 
of that experiment from the analysis. 
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processes such as W decay lepton asymmetry and inclusive jet production. 



Current sets 


Historical sets 


CTEQ5M1 1188 
CTEQ5HJ 1272 84 
MRST99 1297 109 
MRST-a i 1356 168 
MRST-a t 1531 343 


CTEQ4M 1540 352 
MRSR2 1680 492 
MRSRl 1758 570 

CTEQ3M 2254 1066 
MRSA' 3371 2183 



Table 2: Overall Xgiobai values and their increments above the best fit value, for some current 
and historical parton distribution sets. 



B Tests of the quadratic approximation 

The Hessian method relies on a quadratic approximation (^) to the effective function in 
the relevant neighborhood of its minimum. To test this approximation, Fig. |^ shows the 
dependence of along a representative sample of the eigenvector directions. The steep 
directions 1 and 4 are indistinguishable from the ideal quadratic curve Ax^ = ■ The 
shallower directions 7, 10, 13, 16, are represented fairly well by that parabola, although 
they exhibit noticeable cubic and higher-order effects. The agreement at small z is not 
perfect because we adjust the scale factors Sk in (p!0| ) (see footnote |) to improve the average 
agreement over the important region z ^ 5, rather than defining the matrix Hij in (|^) strictly 
by the second derivatives at ^ = 0. For this reason, the scale factors Sk in (|1^) are somewhat 
different from the \/l/ek suggested by the Taylor series: the flattest directions are extremely 
flat only over very small intervals in so it would be misleading to represent them solely by 
their curvature at 2; = 0. 

Figure [1^ shows the dependence of along some random directions in {zi} space. 
The behavior is reasonably close to the ideal quadratic curve Ax^ = z^, implying that the 
quadratic approximation (|^ is adequate. In particular, the approximation gives the range 
of z permitted by T = 10 to an accuracy of ~ 30%. Since the tolerance parameter T used 
to make the uncertainty estimates is known only to perhaps 50%, this level of accuracy is 
sufficient. 
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Figure 9: Variation of with distance along representative eigenvector directions 1, 4, 7, 
10, 13, 16. The first two, shown by sohd curves, are nearly indistinguishable from each other 
and from the ideal A^^ = . The remaining four, shown by clashed curves with increas- 
ing dash length denoting increasing eigenvector number, demonstrate that the quadratic 
approximation is adequate, though imperfect. 




Figure 10: Variation of with distance along 10 randomly chosen directions in {zi\ space. 
The dependence is represented acceptably well by the quadratic approximation A^^ = z^, 
which is shown as the dotted curve. 
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C Table of Best Fit 

Table ^ lists the parameter values that define the "best fit" PDF set 5*0 which minimizes 
Xgiobai- ^Iso lists the uncertainties (for T = 5) in those parameters. 

For each of the d=16 parameters, Table |^ also lists the components of a unit vector 
Zi, . . . , in the eigenvector basis. That unit vector gives the direction for which the param- 
eter varies most rapidly with Xgiobai) direction along which the parameter reaches 
its extreme values for a given increase in Xgiobai- -^^^ parameter Oj, the components z/^ are 
proportional to Mj^ according to Eq. (^). 
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10.29 


1.45 


.00 


.00 


.00 


.00 


-.01 


-.02 


.01 


-.04 






+1.551 - 1.496 


.00 


.01 


.03 


.00 


.00 


.06 


.94 


-.32 




5.379 


0.85 


.00 


.00 


.00 


-.01 


.02 


.06 


-.02 


.10 






+1.02 - .75 


-.02 


-.03 


.01 


.04 


.16 


-.21 


.93 


-.24 




4.498 


1.21 


.00 


.00 


.00 


.00 


.00 


.01 


.00 


.01 






+1.20 - 1.12 


.00 


.01 


.06 


-.04 


-.27 


.94 


-.01 


-.20 



Table 3: Parameters of the global fit. Errors shown are for T = 5, ie., AxgiQ^^^^i = 25. Fixed 

parameters: Af+" = 1.0, 5f'' = 15.0, 5f" = 10.0, and = 0.0 ^ A| irrelevant. The are 
proportional to the Zk{af) = ±tMik of Eq. ([29| ) and are normalized to J^k^l ~ ^■ 
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D Table and Graphs of the Eigenvector sets S 



Table ^ and its continuation Table |^ completely specify the PDF Eigenvector Basis sets 
and by listing all of their parameters at Qq. The notation and the best-fit set are 
specified at the beginning of the table. 

The coefficients listed provide all of the information that is needed for applications. For 
completeness, however, we state here explicitly the connections between these coefficients 
and the constructs that were used elsewhere in the paper to derive them. The fit parameters 
{tti} are related to the tabulated parameters by 

ai=A"" + l, a2 = A2^ as 

a5 = Ai\ ag = ln(l + A;^"), 

ag = Af + 1, aio = v4^, an 

ai3 = ln(l + 4+^), al4 = ln5^/^ ai5 



ln(l + A-; 
= v4f+^ + 1, 

- ^1 ) 



a4 = 
as = 
ai2 



In/, 



ai6 = B, 



gi _ 

A d+u 

d/u 



(33) 



Each of the a^ is thus related to a single PDF parameter, except for ag which is related 
to fg, the momentum fraction carried by gluons, and is thus determined hj Aq, . . . , Af. The 
matrix elements of the transformation from the aj coordinates to the eigenvector coordinates 
are given by 

2 ij 

according to (0), where t = 5 because that value was used to generate the S^. Eqs. (|^) and 
(0) imply 

d 

Ma Mik = Si Sk 5ik . (35) 

i=l 

For i ^ k, this becomes Yl'i=i M^Mik = 0, which can serve as a check on numerical accuracy; 
while for i = k, it becomes Yl'i=i = sj which can be used to reconstruct Si, . . . , s^. 

Finally, for the benefit of the reader who is curious about them, graphs are shown in 
Fig. |ri| of the differences described by each of the PDF eigenvector sets. One sees that the 
steeper directions (small values of t) mainly control aspects of the quark distribution, while 
the shallower directions (high values of t) control the gluon distribution, whose absolute un- 
certainty is larger. The variations in the gluon distribution show less variety than the quarks 
because the gluon distribution is described by only 3 parameters (including normalization), 
such that the most general variation for it is of the form ^ = ci + C2 logx + C3 log(l — x). 
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-0.1941 -0.5337 3.3604 11.8404 0.8552 
—0 7175 —0 5337 4 2296 9 8950 7628 
1.8914 -0.5305 5.5737 0.0000 1.0000 
-1.1174 -1.0092 7.8658 6.6187 1.0000 
10.2940 5.3793 15.0000 10.0000 4.4980 


InA^" Al^ Al^ A\^ 
InA^" Ai'° Ai" Ai" ^f" 

±i± j-iQ r^-^ .f^3 

InAg Af Al Al Al 
In ^q"*"" A'^^^ A^^^ A^^^ A^^^ 
InB^^" Sf" Sf" Sf" Sf" 


-0.1841 -0.5298 3.3596 11.8478 0.8558 
-0.7064 -0.5298 4.2295 9.8957 0.7629 
1.8919 -0.5304 5.5737 0.0000 1.0000 
-1.1342 -1.0089 7.8658 6.6191 1.0000 
10.2940 5.3793 15.0000 10.0000 4.4980 


-0.2042 -0.5376 3.3612 11.8331 0.8546 
-0.7285 -0.5376 4.2297 9.8942 0.7627 
1.8910 -0.5306 5.5737 0.0000 1.0000 
-1.1010 -1.0095 7.8658 6.6182 1.0000 
10.2940 5.3793 15.0000 10.0000 4.4980 


-0.1951 -0.5343 3.3597 11.8372 0.8557 
-0.7184 -0.5343 4.2291 9.8952 0.7636 
1.9025 -0.5321 5.5740 0.0000 1.0000 
-1.1609 -1.0140 7.8662 6.6117 1.0000 
10.2940 5.3794 15.0000 10.0000 4.4980 


-0.1931 -0.5331 3.3611 11.8437 0.8547 
-0.7165 -0.5331 4.2301 9.8947 0.7620 
1.8802 -0.5288 5.5734 0.0000 1.0000 
-1.0751 -1.0043 7.8654 6.6256 1.0000 
10.2940 5.3792 15.0000 10.0000 4.4980 


-0.1981 -0.5338 3.3645 11.8237 0.8501 
-0.7182 -0.5338 4.2287 9.9098 0.7633 
1.9095 -0.5244 5.5730 0.0000 1.0000 
-1.0925 -0.9946 7.8651 6.6340 1.0000 
10.2939 5.3795 15.0000 10.0000 4.4980 


-0.1903 -0.5336 3.3564 11.8567 0.8602 
-0.7167 -0.5336 4.2304 9.8806 0.7623 
1.8738 -0.5364 5.5745 0.0000 1.0000 
-1.1422 -1.0234 7.8665 6.6037 1.0000 
10.2941 5.3791 15.0000 10.0000 4.4980 


-0.1932 -0.5413 3.3436 11.8275 0.8774 
-0.7420 -0.5413 4.2312 9.8814 0.7592 
1.9005 -0.5271 5.5735 0.0000 1.0000 
-1.0798 -0.9990 7.8655 6.6250 1.0000 
10.2964 5.3748 15.0000 10.0000 4.4970 


-0.1960 -0.5260 3.3774 11.8536 0.8327 
-0.6928 -0.5260 4.2279 9.9087 0.7664 
1.8822 -0.5340 5.5739 0.0000 1.0000 
-1.1544 -1.0195 7.8661 6.6122 1.0000 
10.2916 5.3839 15.0000 10.0000 4.4990 


-0.2373 -0.5372 3.3513 12.2488 0.8440 
-0.7278 -0.5372 4.2550 9.5924 0.7435 
1.9071 -0.5244 5.5733 0.0000 1.0000 
-1.1096 -1.0071 7.8668 6.6099 1.0000 
10.2848 5.3943 15.0000 10.0000 4.5012 


-0.1528 -0.5303 3.3692 11.4555 0.8661 
-0.7078 -0.5303 4.2049 10.1974 0.7816 
1.8761 -0.5364 5.5742 0.0000 1.0000 
-1.1246 -1.0112 7.8648 6.6272 1.0000 
10.3029 5.3647 15.0000 10.0000 4.4949 


-0.2370 -0.5419 3.3147 11.7529 0.8399 
-0.6899 -0.5419 4.2039 9.8658 0.8113 
1.9021 -0.5228 5.5734 0.0000 1.0000 
-1.0919 -1.0074 7.8633 6.6099 1.0000 

10.2670 5.4325 15.0000 10.0000 4.5101 


-0.1529 -0.5258 3.4046 11.9257 0.8700 
-0.7483 -0.5258 4.2545 9.9232 0.7159 
1.8811 -0.5379 5.5741 0.0000 1.0000 
-1.1398 -1.0109 7.8682 6.6271 1.0000 
10.3201 5.3279 15.0000 10.0000 4.4863 


-0.2078 -0.5282 3.3181 11.4087 0.8179 
-0.7700 -0.5282 4.2279 10.1816 0.7198 
2.0853 -0.4301 5.5687 0.0000 1.0000 
-1.2032 -1.0484 7.9025 6.3721 1.0000 
10.3046 5.3626 15.0000 10.0000 4.4919 


-0.1833 -0.5383 3.3960 12.2156 0.8866 
-0.6769 -0.5383 4.2310 9.6594 0.7990 
1.7217 -0.6150 5.5779 0.0000 1.0000 
-1.0446 -0.9762 7.8349 6.8326 1.0000 
10.2850 5.3934 15.0000 10.0000 4.5031 


-0.0801 -0.5269 3.4174 11.3977 0.9160 
-0.7805 -0.5269 4.2376 10.2241 0.7090 
1.9359 -0.5086 5.5686 0.0000 1.0000 
-1.1401 -1.0195 7.8611 6.6228 1.0000 
10.2407 5.4629 15.0000 10.0000 4.5154 


-0.3093 -0.5402 3.3060 12.2779 0.7971 
-0.6640 -0.5402 4.2220 9.5897 0.8142 
1.8485 -0.5514 5.5786 0.0000 1.0000 
-1.0838 -0.9994 7.8703 6.6147 1.0000 
10.3449 5.2995 15.0000 10.0000 4.4814 



Table 4: Parameters of the Best Fit Sq and their definitions, followed by Eigenvector Sets 

^1 1 ^1 1 ^2 1 ^2 1 ■ ■ ■■ 
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-0.1659 -0.5391 3.4265 12.5012 0.9121 
-0.6220 -0.5391 4.2087 9.3472 0.8414 
2 0550 —0 4456 5 5690 0000 1 0000 
-1.1277 -1.0312 7.8798 6.3693 1.0000 
10.2997 5.3659 15.0000 10.0000 4.4929 


-0.2243 -0.5283 3.2943 11.2121 0.7983 
-0.8285 -0.5283 4.2505 10.4716 0.6842 
1 7218 —0 6154 5 5785 0000 1 0000 
-1.0969 -0.9873 7.8518 6.8764 1.0000 
10.2883 5.3927 15.0000 10.0000 4.5031 


-0.2209 -0.5385 3.3464 11.9679 0.8505 
-0.7087 -0.5385 4.2773 10.1973 0.7903 
2 0697 —0 4336 5 4996 0000 1 0000 
-1.4530 -1.0911 7.6927 8.5071 1.0000 
10.3080 5.3568 15.0000 10.0000 4.5032 


-0.1581 -0.5272 3.3793 11.6700 0.8616 
-0.7304 -0.5272 4.1650 9.4988 0.7256 
1 6342 —0 661 6 5 6741 0000 1 flOflfl 
-0.6805 -0.8983 8.1000 4.6459 1.0000 
10.2750 5.4097 15.0000 10.0000 4.4910 


0.0212 -0.4998 3.3685 9.5867 0.8619 
-0.3803 -0.4998 4.5680 9.5749 0.9267 

1 8986 —0 5295 5 5981 0000 1 0000 
-1.0511 -0.9997 7.8935 6.2437 1.0000 
10.3431 5.3894 15.0000 10.0000 4.5718 


-0.3963 -0.5645 3.3530 14.3021 0.8491 
-1.0637 -0.5645 3.9221 10.1942 0.6138 
1 8849 —0 5314 5 5516 0000 1 0000 
-1.1519 -1.0178 7.8406 6.9762 1.0000 
10.2494 5.3702 15.0000 10.0000 4.4309 


-0.6200 -0.6029 3.3322 17.6542 0.8617 
-1.0990 -0.6029 4.4299 18.0058 0.8742 
1 9215 —0 5171 5 5927 0000 1 0000 
-0.9608 -0.9876 7.7202 4.9575 1.0000 
10.2885 5.4121 15.0000 10.0000 4.4446 


0.2128 -0.4640 3.3889 7.8121 0.8487 
-0.3418 -0.4640 4.0277 5.2176 0.6505 

1 8610 —0 5440 5 5546 0000 1 0000 
-1.2706 -1.0310 8.0126 8.7623 1.0000 
10.2995 5.3462 15.0000 10.0000 4.5518 


-0.3285 -0.5667 3.3800 14.2366 0.9075 
-0.7740 -0.5667 4.3664 12.6923 0.9007 
1.8124 -0.5621 5.5060 0.0000 1.0000 
-1.3022 -1.0266 8.6732 12.2662 1.0000 
10.2946 5.5039 15.0000 10.0000 4.1962 


-0.0441 -0.4965 3.3383 9.5883 0.7962 
-0.6624 -0.4965 4.0754 7.4209 0.6073 
1.9800 -0.4949 5.6501 0.0000 1.0000 
-0.9707 -0.9896 6.9557 3.0774 1.0000 
10.2934 5.2389 15.0000 10.0000 4.8382 


-0.4030 -0.5716 3.3718 14.9177 0.8788 
-0.9934 -0.5716 4.1229 11.9127 0.7280 
1.9461 -0.5128 5.6623 0.0000 1.0000 
-1.1804 -1.0206 8.1430 8.1232 1.0000 
10.3763 5.1976 15.0000 10.0000 5.6698 


-0.0012 -0.4980 3.3497 9.4869 0.8330 
-0.4696 -0.4980 4.3301 8.2828 0.7956 
1.8398 -0.5471 5.4902 0.0000 1.0000 
-1.0571 -0.9985 7.6046 5.4285 1.0000 
10.2164 5.5506 15.0000 10.0000 3.3936 


-0.2036 -0.5399 3.4062 12.7490 0.8931 
-0.9051 -0.5399 4.0867 9.6572 0.6340 
2.0530 -0.4819 5.8758 0.0000 1.0000 
-1.1784 -1.0238 7.7670 6.9327 1.0000 
11.7435 6.2127 15.0000 10.0000 4.4913 


-0.1860 -0.5282 3.3199 11.0884 0.8217 
-0.5709 -0.5282 4.3557 10.1093 0.8765 

1.7480 -0.5734 5.3071 0.0000 1.0000 
-1.0517 -0.9963 7.9530 6.3518 1.0000 

9.0144 4.6435 15.0000 10.0000 4.5039 


-0.2054 -0.5338 3.3661 12.0453 0.8534 
-0.7110 -0.5338 4.2688 10.4095 0.7879 

3.2798 -0.1380 8.5257 0.0000 1.0000 
-1.7279 -1.1163 7.6289 12.7489 1.0000 

9.7697 5.1479 15.0000 10.0000 4.2287 


-0.1853 -0.5336 3.3560 11.6819 0.8566 
-0.7226 -0.5336 4.1988 9.5075 0.7431 
0.7313 -0.8385 3.2571 0.0000 1.0000 
-0.6889 -0.9252 8.0517 3.7937 1.0000 
10.7054 5.5609 15.0000 10.0000 4.7094 



Table 5: Continuation of Table Ij: parameters of the Eigenvector Sets 8^,8^, . . . , 8 
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Figure 11: Displacements of the w-quark and gluon distributions f^{x,Q) — f {x,Q) at 
Q = 10 GeV corresponding to — S^. Length of dashes increases for £ = 1, ... , 16. 
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